Parallel Sequence Mining on Shared-Memory Machines
Abstract
We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on a separate processor, requiring no synchronization. However, dynamic inter-class and intra-class load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12-processor SGI Origin 2000 shared-memory system show good speedup and excellent scaleup results.
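The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pSPADE implementation. It assumes each event carries a single item (no itemset elements), uses a hypothetical input format (a `dict` mapping each sequence id to `(eid, item)` pairs), and the names `build_idlists`, `temporal_join`, `extend` and `pspade_sketch` are illustrative. Each top-level suffix class [x] (all frequent sequences ending in item x) is mined independently by recursive id-list temporal joins, and a `ThreadPoolExecutor` hands classes to whichever worker is idle, a simple form of dynamic inter-class load balancing. Intra-class balancing is not shown, and in CPython the global interpreter lock limits real speedup for this pure-Python code, so the pool only illustrates the scheduling.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input format: {sid: [(eid, item), ...]}, one item per event.


def build_idlists(db):
    """Vertical layout: map each item to its set of (sid, eid) occurrences."""
    idlists = defaultdict(set)
    for sid, events in db.items():
        for eid, item in events:
            idlists[item].add((sid, eid))
    return idlists


def support(idlist):
    """Support = number of distinct sequences containing the pattern."""
    return len({sid for sid, _ in idlist})


def temporal_join(first, rest):
    """Given L(a -> S) and L(b -> S), each storing the eid of the pattern's
    first item, return L(a -> b -> S): occurrences (sid, t) of `first` that
    start strictly before some occurrence of `rest` in the same sequence."""
    latest = {}
    for sid, t in rest:
        latest[sid] = max(latest.get(sid, t), t)
    return {(sid, t) for sid, t in first if sid in latest and t < latest[sid]}


def extend(members, pat_j, idl_j, minsup, out):
    """Mine the suffix class [pat_j]: every frequent a -> pat_j, recursively.

    All `members` share the suffix pat_j[1:], so joining member a -> pat_j[1:]
    with pat_j yields the candidate a -> pat_j."""
    child = []
    for pat_i, idl_i in members:
        idl = temporal_join(idl_i, idl_j)
        sup = support(idl)
        if sup >= minsup:  # anti-monotone pruning
            pat = (pat_i[0],) + pat_j
            out.append((pat, sup))
            child.append((pat, idl))
    # The new class members all end with pat_j; recurse on each of them.
    for pat, idl in child:
        extend(child, pat, idl, minsup, out)


def pspade_sketch(db, minsup, workers=4):
    """Return sorted (pattern, support) pairs for all frequent sequences."""
    idlists = build_idlists(db)
    items = [((x,), idl) for x, idl in sorted(idlists.items())
             if support(idl) >= minsup]

    def solve_class(member):
        # One independent task per top-level suffix class [x]; it reads the
        # shared 1-item id-lists but never writes shared state.
        pat, idl = member
        out = []
        extend(items, pat, idl, minsup, out)
        return out

    results = [(pat, support(idl)) for pat, idl in items]
    # The pool hands the next class to whichever thread is idle.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for out in pool.map(solve_class, items):
            results.extend(out)
    return sorted(results)
```

For example, with `db = {1: [(1, "A"), (2, "B"), (3, "C")], 2: [(1, "A"), (2, "C")], 3: [(1, "B"), (2, "C")]}` and a minimum support of 2, `pspade_sketch(db, 2)` returns `[(("A",), 2), (("A", "C"), 2), (("B",), 2), (("B", "C"), 2), (("C",), 3)]`. Because a class needs only the shared 1-item id-lists, no locking is required between tasks.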
Similar resources
Scalable Data Mining for Rules
Data Mining is the process of automatic extraction of novel, useful, and understandable patterns in very large databases. High-performance scalable and parallel computing is crucial for ensuring system scalability and interactivity as datasets grow inexorably in size and complexity. This thesis deals with both the algorithmic and systems aspects of scalable and parallel data mining algorithms a...
Shared Memory Parallelization of Data
With the availability of large datasets in application areas like bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, it is becoming increasingly important to execute data mining tasks in parallel. At the same time, technological advances have made shared memory parallel machines commonly available to organizations and...
Compiler and Runtime Support for Shared Memory Parallelization of Data Mining Algorithms
Data mining techniques focus on finding novel and useful patterns or models from large datasets. Because of the volume of the data to be analyzed, the amount of computation involved, and the need for rapid or even interactive analysis, data mining applications require the use of parallel machines. We have been developing compiler and runtime support for developing scalable implementations of da...
Multi-Objective Unrelated Parallel Machines Scheduling with Sequence-Dependent Setup Times and Precedence Constraints
This paper presents a novel, multi-objective model of a parallel machines scheduling problem that minimizes the number of tardy jobs and total completion time of all jobs. In this model, machines are considered as unrelated parallel units with different speeds. In addition, there is some precedence, relating the jobs with non-identical due dates and their ready times. Sequence-dependent setup t...
WHAT GOOD ARE SHARED-MEMORY MODELS? - Parallel Processing, 1996. Proceedings of the 1996 ICPP Workshop on Challenges for
Shared memory models have been criticized for years for failing to model essential realities of parallel machines. Given the current wave of popular message-passing and distributed memory models (e.g., BSP, LOGP), it is natural to ask whether shared memory models have outlived any usefulness they may have had. In this invited position paper, we discuss the continuing importance of shared memory ...
Journal title:
- J. Parallel Distrib. Comput.
Volume 61
Publication date: 1999